Zipf's Law and Random Texts

Authors

  • Ramon Ferrer-i-Cancho
  • Ricard V. Solé
Abstract

Random-text models have been proposed as an explanation for the power law relationship between word frequency and rank, the so-called Zipf’s law. They are generally regarded as null hypotheses rather than models in the strict sense. In this context, recent theories of language emergence and evolution assume this law as a priori information with no need of explanation. Here, random texts and real texts are compared through (a) the so-called lexical spectrum and (b) the distribution of words having the same length. It is shown that real texts fill the lexical spectrum much more efficiently and regardless of the word length, suggesting that the meaningfulness of Zipf’s law is high.
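The two comparisons named in the abstract can be illustrated with a minimal sketch of a random-text generator and the statistics computed from it. This is not the authors' procedure: the function names, the four-letter alphabet, and the space probability are assumptions chosen only for illustration. The same functions could be applied to a real text by passing its contents to `word_frequencies`.

```python
import random
from collections import Counter, defaultdict


def random_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Emit n_chars symbols: a space with probability p_space,
    otherwise a letter drawn uniformly from alphabet.
    (alphabet and p_space are illustrative assumptions.)"""
    rng = random.Random(seed)
    return "".join(
        " " if rng.random() < p_space else rng.choice(alphabet)
        for _ in range(n_chars)
    )


def word_frequencies(text):
    """Count occurrences of each whitespace-delimited word."""
    return Counter(text.split())


def rank_frequency(freqs):
    """Frequencies sorted in decreasing order; index 0 is rank 1."""
    return sorted(freqs.values(), reverse=True)


def lexical_spectrum(freqs):
    """Map each frequency f to the number of distinct words
    that occur exactly f times."""
    return Counter(freqs.values())


def frequencies_by_length(freqs):
    """Group word frequencies by word length, each group sorted
    in decreasing order."""
    groups = defaultdict(list)
    for word, count in freqs.items():
        groups[len(word)].append(count)
    return {length: sorted(counts, reverse=True)
            for length, counts in groups.items()}


if __name__ == "__main__":
    freqs = word_frequencies(random_text(100_000))
    print("top ranks:", rank_frequency(freqs)[:5])
    print("words seen once:", lexical_spectrum(freqs)[1])
    for length, counts in sorted(frequencies_by_length(freqs).items())[:3]:
        print("length", length, "->", len(counts), "distinct words")
```

In a random text of this kind, words of equal length are equally probable, so their frequencies cluster together. Grouping by length is one way to look for that signature and compare it with a real text.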


Similar resources

Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution

BACKGROUND: Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, ...) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random...


Zipf's law and the structure and evolution of languages

By using a vast number of examples in social and economical data including natural languages, George Zipf was able to show an amazingly robust functional form of the rank-frequency plots [11], f ∼ 1/r (f for frequency, r for rank), now commonly called Zipf's curve or Zipf's law. George Miller, a renowned linguist, summarized this study in 1965: Faced with this massive statistical regularity, you have...


Random texts exhibit Zipf's-law-like word frequency distribution

It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation from the word's length to it...


Large-Scale Analysis of Zipf’s Law in English Texts

Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted to a representatively large number of texts. So, we can summarize the current support of Zipf's law in texts as anecdotic. We try to solve these issu...


Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with dif...



Journal:
  • Advances in Complex Systems

Volume 5

Publication date: 2002